ABSTRACT

A self-stabilizing network in the form of an arbitrary, non-partitioned
digraph includes K nodes having a synchronizer executing a protocol.
K-1 monitors of each node may receive a Sync message transmitted
from a directly connected node. When the Sync message is received,
the logical clock value for the receiving node is set to between 0 and
a communication latency value (γ) if the clock value is less than a
minimum event-response delay (D). A new Sync message is also
transmitted to any directly connected nodes if the clock value is
greater than or equal to both D and a graph threshold (T_S).
When the Sync message is not received the synchronizer 
increments the clock value if the clock value is less than a 
resynchronization period (P), and resets the clock value and 
transmits a new Sync message to all directly connected nodes 
when the clock value equals or exceeds P. 

19 Claims, 3 Drawing Sheets 
FAULT-TOLERANT SELF-STABILIZING
DISTRIBUTED CLOCK SYNCHRONIZATION 
PROTOCOL FOR ARBITRARY DIGRAPHS 

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional 
Patent Application No. 61/442,826 filed on Feb. 15, 2011, 
which is hereby incorporated by reference in its entirety.

STATEMENT REGARDING FEDERALLY 
SPONSORED RESEARCH OR DEVELOPMENT 

The present invention was made by an employee of the
United States Government and may be manufactured and
used by or for the Government of the United States for
governmental purposes without the payment of any royalties
thereon or therefor.

TECHNICAL FIELD 

The present disclosure relates to a fault-tolerant protocol
and system for synchronizing local logical time clocks in an
arbitrary, non-partitioned digraph.

BACKGROUND 

Distributed systems typically require the accurate, coordinated
timing of process steps and task sequences to facilitate
overall event synchronization and data correlation. Even
when initially set accurately, clocks used in the various
devices of the distributed system will differ over time due to
inherent clock drift. Each clock frequency source, typically a
crystal oscillator, can run at a slightly different rate, so error
accumulates over time. Operating environment, age, and
other factors affect each physical clock somewhat differently,
and thus can affect the rate of change and accumulated error
within the distributed system as a whole.

Clock synchronization algorithms are therefore essential
for managing system resources and controlling communication
between nodes of the system. For proper clock synchronization,
either each node accesses timing signals originating
from a common time source, for instance global positioning
satellite signals, or the nodes synchronize their individual
local logical time clocks in a distributed way using knowledge
from the other nodes.

SUMMARY 

A distributed clock synchronization method or protocol is
disclosed herein, along with a distributed system that uses the
presently disclosed protocol to achieve and maintain clock
synchrony. The present approach provides a fault-tolerant
solution for a network of K nodes in the form of an arbitrary,
non-partitioned directed graph, i.e., a digraph. “True synchrony”
is defined as operating and exchanging messages
between system nodes in perfect unison, a process that is only
possible under the strictest assumptions and under ideal
conditions. “Bounded-synchrony”, on the other hand, is a more
general term that encompasses certain imperfections in the
network. Bounded-synchrony refers to the exchange of local
time information by nodes of a network in unison but within
a given bound. Thus, the term “synchrony” as used herein
means “bounded-synchrony”.

The networks/digraphs considered in the present disclosure
range from fully-connected to 1-connected networks of
nodes, while also allowing for differences in the network
elements. Example networks that may be synchronized via
the presently disclosed protocol include grid, ring, fully-connected,
bipartite, and star (hub) networks. Other networks may be
envisioned, and therefore this list of examples is non-limiting.

The present protocol does not require a particular informa- 
tion flow, nor does it impose changes to the network in order 
to achieve the desired synchrony. The approach only consid- 
ers distributed systems in the absence of non-detectable 
faults. This departure from the Byzantine extreme of the fault
spectrum is taken in part because of the niche use and extra
cost associated with tolerating Byzantine faults. Also, using
authentication and error detection techniques, it is possible to
substantially reduce the effects of a variety of faults in the system.

In particular, a self-stabilizing network is disclosed herein
which includes K nodes. Each node communicates with its
neighbor nodes, i.e., any nodes that are directly connected to
it, via the transmission or broadcast of low-overhead
Sync messages as described in detail herein. The Sync message
is the only type of message used to self-stabilize the
network. Each of the K nodes includes a synchronizer, such as
but not limited to a state machine, K-1 monitors in communication
with the synchronizer, a local physical oscillator/physical
clock, and a logical time clock. The logical time
clock has a variable integer clock value that is represented
herein as the clock value LocalTimer. The clock value
LocalTimer can vary from 0 to a maximum allowable value of P as
described herein. Such a logical time clock may be embodied
as an integer counter.

The logical time clock is in communication with the syn- 
chronizer, is driven by the local physical oscillator, and 
locally keeps track of the passage of clock time for a given 
node as the clock value LocalTimer. Each monitor in a given 
node can receive a Sync message transmitted by another node 
that is directly connected to or in direct communication with 
the node in which the monitor resides. 

The synchronizer continuously executes the present proto- 
col, with the term “continuously” as used herein meaning 
truly continuously in an analog embodiment and once per 
logical clock tick in a digital embodiment. Upon receiving a 
valid Sync message from one or more of the monitors, the 
synchronizer executes the steps of the present protocol in 
accordance with the results of certain threshold comparisons 
as set forth herein. 

An example self-stabilizing network in the form of an
arbitrary, non-partitioned digraph, without using a central
clock or a centrally generated signal, pulse, or message of any
type for self-stabilization, includes K nodes configured to
selectively transmit a Sync message. K at all times is at least
1. That is, as few as one node can run the present protocol and
operate properly, e.g., a given node may wake up before the
others, or a network may temporarily downgrade to one active
node, whether or not other nodes are present. The other nodes
can integrate into the system/network by joining the only
actively present node. Such a scenario is more prevalent in a
dynamic network and also when the communication medium
is not hard-wired between the nodes.

Upon commencing execution of the present protocol, the 
synchronizer checks the current clock value LocalTimer for 
its node. If the clock value LocalTimer is less than 0, the 
LocalTimer is reset, i.e., set equal to 0. 

When the clock value LocalTimer is greater than or equal
to 0 and a valid Sync message has been received, appropriate
steps are taken with respect to the clock value LocalTimer and/or
the transmission of new Sync messages, as set forth below. If a
valid Sync message is not received, the clock value LocalTimer
is compared to P. If LocalTimer equals or exceeds P,
LocalTimer is reset and a new Sync message is transmitted to
all nodes that are directly connected to the node in which the
synchronizer resides. If LocalTimer is less than P in this
comparison, LocalTimer is incremented.

If a valid Sync message is received, the synchronizer instead
performs a set of threshold comparisons. First, if the clock
value LocalTimer is less than a minimum event-response
delay (D), the clock value LocalTimer is set to between 0 and
a communication latency value (γ), depending on the
embodiment. That is, the value may be 0, γ, or anything in between.

When a valid Sync message is received and the clock value
LocalTimer is greater than or equal to both D and a calibrated
graph threshold (T_S), the synchronizer likewise sets the clock
value LocalTimer to between 0 and γ, again depending
on the embodiment, but also transmits a new Sync message to
all nodes that are directly connected to the node in which the
synchronizer resides.

In another embodiment, instead of setting LocalTimer
to between 0 and γ, the synchronizer sets the clock value
LocalTimer to the sum of the incoming LocalTimer value, i.e.,
LocalTimerIn, plus γ to compensate for the worst-case message delay.
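The per-tick decision logic described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the protocol 100 of FIG. 5 itself: the names `Params`, `synchronizer_step`, and `sync_reset` are hypothetical; the value loaded on an accepted Sync is a configurable point in [0, γ]; repairing a negative clock is treated as the only action of that tick; and a Sync that arrives while D ≤ LocalTimer < T_S, which matches neither case above, is simply ignored.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical protocol constants, all in local clock ticks."""
    D: int               # minimum event-response delay
    gamma: int           # communication latency (gamma = D + d)
    T_S: int             # graph threshold
    P: int               # resynchronization period
    sync_reset: int = 0  # value loaded on an accepted Sync; any value in [0, gamma]

    def __post_init__(self) -> None:
        if not 0 <= self.sync_reset <= self.gamma:
            raise ValueError("sync_reset must lie in [0, gamma]")


def synchronizer_step(local_timer: int, sync_received: bool, p: Params) -> tuple[int, bool]:
    """Run one tick; return (new LocalTimer, whether to transmit a Sync to all neighbors)."""
    # Preliminary step: a negative clock value (power-on, severe upset) is reset to 0.
    if local_timer < 0:
        return 0, False
    if sync_received:
        if local_timer < p.D:
            # Sync arrived early: realign the clock but do not relay.
            return p.sync_reset, False
        if local_timer >= p.T_S:
            # LocalTimer >= D and >= T_S: realign and relay a new Sync.
            return p.sync_reset, True
        # D <= LocalTimer < T_S: this sketch ignores the Sync and falls through.
    if local_timer >= p.P:
        # Time-out: reset and start a new resynchronization round.
        return 0, True
    # Otherwise simply count one more tick.
    return local_timer + 1, False


params = Params(D=2, gamma=3, T_S=5, P=10)
print(synchronizer_step(7, True, params))  # (0, True): accepted and relayed
print(synchronizer_step(3, True, params))  # (4, False): ignored, clock keeps counting
```

Choosing `sync_reset=gamma` instead of 0 would correspond to the embodiment that loads γ on receipt; the LocalTimerIn + γ variant would need the incoming value as an extra argument.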

The above features and advantages and other features and
advantages of the present invention are readily apparent from
the following detailed description of the best modes for carrying
out the invention when taken in connection with the
accompanying drawings.


BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a schematic illustration of an example distributed 
system having local logical time clocks that may be synchro- 
nized according to the present protocol. 

FIG. 2 is a time plot of the variable clock value, LocalTimer,
of an example logical time clock.

FIG. 3 is a schematic illustration of Sync message flow
between connected nodes in real time within an example
network or digraph.

FIG. 4 is a schematic block diagram of the i-th node, N_i, of
an example self-stabilizing network or digraph.

FIG. 5 is a flow chart illustrating one possible embodiment 
of the present protocol. 

FIG. 5A shows two alternative steps that can be used in a 
first variation of the protocol shown in FIG. 5. 

FIG. 5B shows three alternative steps that can be used in a
second variation of the protocol shown in FIG. 5.

FIG. 6 is a schematic block diagram of an example logic
circuit for implementing the i-th node of FIG. 4 according to
the protocol embodiment of FIG. 5.


DETAILED DESCRIPTION 


The present invention is described herein with reference to
the accompanying drawings. The invention, however, may be
embodied in many different forms, and therefore should not
be construed as being limited to the particular embodiments
set forth herein. Further discussion of the present invention is
provided in Mahyar R. Malekpour, “A Self-Stabilizing
Synchronization Protocol for Arbitrary Digraphs”, NASA/TM-2011-217054,
February 2011; Mahyar R. Malekpour, “Model Checking a
Self-Stabilizing Distributed Clock Synchronization Protocol
For Arbitrary Digraphs”, NASA/TM-2011-217152, May 2011; and
Mahyar R. Malekpour, “Correctness Proof for a Self-Stabilizing
Distributed Clock Synchronization Protocol For Arbitrary
Digraphs”, NASA/TM-217184, October 2011, all of which are
hereby incorporated by reference in their entireties.


Referring to the drawings, wherein like reference numbers 
correspond to like or similar components throughout the sev- 
eral figures, an example distributed system 10 is shown in 
FIG. 1 that includes a plurality of networked devices 12, 16, 
and 20. For illustrative simplicity only three devices are 
shown in FIG. 1. However, any other plurality may be used 
with the present approach. 

Each of the networked devices 12, 16, and 20 includes a 
respective logical time clock 13, 17, and 21, a respective 
physical oscillator 14, 18, and 22, e.g., an oscillating crystal, 
a pacemaker cell, or any other oscillating device, and respec- 
tive logic circuit 15, 19, and 23 for implementing the present 
clock synchronization protocol. An example of this protocol 
is described below with reference to FIG. 5, with variations 
described with additional reference to FIGS. 5A and 5B. 

The networked devices 12, 16, and 20 of FIG. 1 form a 
system of pulse-coupled entities each pulsating at regular 
time intervals via their respective oscillators 14, 18, and 22. 
The devices 12, 16, and 20 are coupled through some physical 
connection 24, e.g., wires, fiber optic cables, a chemical pro- 
cess, etc., or wirelessly through air or a vacuum as indicated 
by waves 25. 

The underlying system 10 can be modeled as a network 11
composed of a set of communication nodes, for instance
nodes 30A, 30B, 30C as shown in FIG. 3 and discussed below.
The devices 12, 16, and 20 communicate with each other by
exchanging Sync messages, e.g., 1-bit messages in one
particularly low-overhead embodiment, although other Sync
messages such as 8-bit messages or 16-bit messages may be
used. The broadcast or transmission of a Sync message by a
given device 12, 16, 20 is realized by transmitting the Sync
message at the same time to all devices/nodes that are directly
connected to that node. This concept is described in further
detail below with reference to the node diagram of FIG. 3.

The various networked devices 12, 16, and 20 of FIG. 1
execute instructions embodying the present protocol 100, an
example of which is shown in FIG. 5, thereby providing a
fault-tolerant method for self-stabilization and time
synchronization within the distributed system 10. A “fault” is defined
herein as a defect or flaw in a component resulting in an
incorrect state. The present protocol 100 provides a solution
for the synchronization of an arbitrary, non-partitioned
network (digraph) in the absence of non-detectable faults. It
tolerates any detectable faults, and thus is fault-tolerant
to this extent. The protocol also tolerates node and link
dropouts, i.e., failures, as long as the network stays faithful to the
definition of the digraph, in other words, provided that the
failure of nodes and/or links does not partition the digraph.

Continuous execution of the protocol 100 of FIG. 5, such as
once per clock tick in a digital embodiment, and of its various
alternative embodiments of FIGS. 5A and 5B by the various
devices, for instance the networked devices 12, 16, and 20 of
FIG. 1, provides a self-stabilizing solution for a network in
the form of an arbitrary, non-partitioned digraph. The
distributed system 10 of FIG. 1 is “self-stabilizing” if from an
arbitrary state it is guaranteed to reach a “legitimate” state in
a finite amount of time and remain in that legitimate state
thereafter. The protocol 100 can self-stabilize from any initial
state, i.e., it does not rely on assumptions about the initial state
of the network other than the presence of at least one node,
which in turn may be anonymous. That is, a node may have no
identifier at all, such as an IP address, or it may have an identifier
that is not unique with respect to the identifiers of other nodes
used within the network. A legitimate state is defined as a state
in which all parts of the distributed system 10 are in bounded
synchrony.

The Logical Clock (LocalTimer)

Referring to FIG. 2, the various logical time clocks 13, 17, 
and 21 of FIG. 1 are driven by the respective physical oscil- 
lators 14, 18, and 22 shown in the same Figure. Each logical 
time clock 13, 17, and 21 emits a respective local time signal 
over real time (t), with that signal referred to hereinafter as the 
clock value LocalTimer. In one embodiment, the value of 
LocalTimer may be an integer. If the LocalTimer is defined as 
an integer, it can take on +/- values, with negative values 
being rare, e.g., potentially occurring during power on and/or 
severe upset or malicious scenarios. The LocalTimer may not 
be an integer in another embodiment, but such an embodi- 
ment may not adequately prevent the worst case scenarios. 

An example trace 33 of the clock value LocalTimer is 
shown in FIG. 2. Trace 33 is a monotonic linear function 
increasing from an initial value, e.g., 0, to a calibrated maxi- 
mum value of P. As noted above, rare cases may occur in 
which the integer value is negative, and thus the protocol 
handles this possibility in a preliminary step as set forth 
below. 

If uninterrupted, i.e., when a given node does not receive
any Sync messages from other directly connected nodes, the
clock value LocalTimer for a given node periodically takes on
integer values from an initial value to a maximum value of P,
linearly increasing within each period as shown. That is, the
clock value LocalTimer is typically bounded by
0 ≤ LocalTimer ≤ P.
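The uninterrupted sawtooth behavior just described (trace 33 of FIG. 2) can be reproduced with a small Python generator. This is an illustrative sketch: `free_running_trace` is a hypothetical name, one value is emitted per oscillator tick, and the wrap to 0 after reaching P mirrors the time-out reset described earlier.

```python
from itertools import islice
from typing import Iterator


def free_running_trace(P: int, start: int = 0) -> Iterator[int]:
    """Yield LocalTimer once per tick for a node that never receives a Sync."""
    value = start
    while True:
        yield value
        # Count up by one per tick; after reaching P, time out and restart at 0.
        value = 0 if value >= P else value + 1


print(list(islice(free_running_trace(P=3), 9)))  # [0, 1, 2, 3, 0, 1, 2, 3, 0]
```

In this sketch a full cycle visits every integer in [0, P], so it spans P + 1 ticks.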

Referring to FIG. 3, the distributed system 10 of FIG. 1 can
be modeled as an example network 11 in the form of a digraph
having a set of communications nodes 30A, 30B, 30C, which
are collectively referred to as the nodes 30. Communication
between the nodes 30 occurs via transmission/broadcast of
messages (arrows 32A, 32B, 32C) over communication
channels as is well understood in the art, with the various
communications channels collectively representing the available
connectivity within the distributed system 10 of FIG. 1.

The underlying topology T is an arbitrary, non-partitioned
digraph of K ≥ 1 nodes 30. The nodes 30 may be anonymous in
the sense that they may lack a unique identity, even if some of
the nodes have an identifier such as an IP address. All of the
nodes 30 are considered to be good, that is, to actively
participate in the synchronization process and to be able to
correctly execute the protocol 100 of FIG. 5 and its various
embodiments as disclosed herein.

As used herein, the term “source node” refers to a particular
node 30 from which a Sync message (arrows 32A, 32B, or
32C) originates. Likewise, the term “destination node” refers
to a node 30 which receives a Sync message. Thus, a source
node may also act as a destination node and vice versa. The
communications channels, like the various nodes 30, are also
assumed to be good, i.e., to reliably transfer data between
source and destination nodes. As noted above, each of the
nodes 30 communicates with other nodes 30 by transmitting
messages (arrows 32A, 32B, or 32C) to any nodes 30 directly
connected to that source node. For instance, in FIG. 3 node
30A, a source node, may transmit a Sync message (arrow
32A) to nodes 30B and 30C, with the nodes 30B and 30C in
this instance acting as destination nodes.

The example network 11 of FIG. 3 does not guarantee a
relative order of arrival of a given transmitted message at any
particular receiving node. Additionally, as noted above, the
network 11 is characterized by an absence of a central system
clock or any centrally-generated signal, pulse, or message of
any kind at the network level, i.e., central with respect to the
network or global with respect to a particular node 30 and its
associated synchronizer 28 (see FIG. 4). The communications
channels and nodes 30 can behave arbitrarily provided
that, eventually, the network 11 adheres to the various protocol
assumptions noted below.

Drift Rate (ρ)

Each node 30 is driven by a respective independent,
free-running local physical oscillator 14, 18, or 22 as shown in
FIG. 1, whose phase is not controlled in any way, and by the
corresponding logical clock 13, 17, or 21. The logical clocks
13, 17, and 21 of FIG. 1 locally track the passage of time for
their respective nodes. A single oscillator tick is a discrete
value that forms the basic unit of time within the network 11
of FIG. 3.

An ideal oscillator has zero drift rate, ρ, with respect to real
time t, thus perfectly marking the passage of time. However,
real oscillators are characterized by non-zero drift rates with
respect to real time. The oscillators 14, 18, and 22 of the
various nodes 30 shown in FIG. 3 are assumed to have a
known bounded drift rate ρ, which is a small constant with
respect to real time, where ρ is a unitless, non-negative real
value expressed as 0 ≤ ρ ≪ 1.

The maximum drift of the fastest LocalTimer used in the
network 11 of FIG. 3 over a time interval (t) is given by
(1+ρ)t. Likewise, the maximum drift of the slowest
LocalTimer over the time interval (t) is given by (1/(1+ρ))t.
Therefore, the maximum relative drift of the fastest and slowest
nodes 30 of FIG. 3 with respect to each other over a time interval (t) is
given by:

$$\delta(t) = \left( (1+\rho) - \frac{1}{1+\rho} \right) t$$
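As a quick numerical check of δ(t), the Python sketch below evaluates the fastest and slowest clock advances after t ticks and their difference. The function name and the sample values (ρ = 10^-4, t = 10^6 ticks) are illustrative assumptions. Since (1+ρ) - 1/(1+ρ) = (2ρ + ρ²)/(1+ρ), δ(t) is close to 2ρt for small ρ.

```python
def relative_drift(t: float, rho: float) -> float:
    """delta(t): worst-case divergence between the fastest and slowest clocks after t."""
    if not 0 <= rho < 1:
        raise ValueError("expected 0 <= rho < 1")
    fastest = (1 + rho) * t   # fastest LocalTimer advances by (1 + rho) t
    slowest = t / (1 + rho)   # slowest LocalTimer advances by t / (1 + rho)
    return fastest - slowest


print(round(relative_drift(1_000_000, 1e-4), 3))  # 199.99
```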

Communication Delay (D), Network Imprecision (d), and
Latency (γ)

Still referring to FIG. 3, the communication latency (γ)
between adjacent nodes 30 is expressed in terms of the
minimum event-response delay (D) and a measure of network
imprecision (d). A Sync message (arrow 32A) transmitted at
time t_0 is expected to arrive at all destination nodes, e.g., node
30C, and to be processed there, with any subsequent messages
generated by the receivers in the interval [t_0 + D, t_0 + D + d].

Communication between independently clocked nodes 30
is inherently imprecise. The network imprecision, d, is the
maximum time difference among all receivers of a message
from a transmitting node 30 with respect to real time. The
network imprecision, d, is due to oscillator drift with respect
to real time, jitter, discretization error, temperature effects,
and differences in the lengths of the physical communication
media. The parameters d and D are assumed to be bounded
such that D ≥ 1 and d ≥ 0, and both have discrete values with
units of a real time clock tick. The communication latency (γ)
is thus expressed in terms of D and d, and is constrained by:

$$\gamma = D + d$$

The communication delay between any two adjacent nodes
30 is constrained by [D, γ].
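The relationships among D, d, and γ, together with the [D, γ] delay bound, can be captured in a small Python helper. `LinkTiming`, `gamma`, and `delay_ok` are hypothetical names introduced only for illustration; all values are integer real-time clock ticks, following the constraints D ≥ 1 and d ≥ 0.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LinkTiming:
    """Hypothetical timing parameters of a link between adjacent nodes, in ticks."""
    D: int  # minimum event-response delay (D >= 1)
    d: int  # network imprecision (d >= 0)

    def __post_init__(self) -> None:
        if self.D < 1 or self.d < 0:
            raise ValueError("require D >= 1 and d >= 0")

    @property
    def gamma(self) -> int:
        """Communication latency gamma = D + d."""
        return self.D + self.d

    def delay_ok(self, delay: int) -> bool:
        """True if an observed adjacent-node delay lies within [D, gamma]."""
        return self.D <= delay <= self.gamma


print(LinkTiming(D=2, d=1).gamma)  # 3
```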

Network Topology 

A communication link is an edge in the digraph representing
a direct physical connection between two nodes 30. A
path is a logical connection between two nodes 30 consisting
of one or more links. A path-length is the number of links
connecting any two nodes. The general topology T considered
herein is a strongly connected digraph (e.g., network 11)
consisting of K nodes 30, with K=3 in the example embodiment
of FIG. 3. Each node 30 is connected to the network 11
by at least one communications channel. There is a path from
any given node 30 to every other node 30, and the
communications channels are either unidirectional or bidirectional.
Furthermore, the present approach assumes there is no direct
path from any node 30 back to itself, i.e., no self-loop, and
there are no multiple channels directly connecting any two
nodes 30 in any one direction. This is the general framework
within which the present protocol 100 of FIG. 5 and the
alternative embodiments of FIGS. 5A and 5B operate.
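The topology assumptions above (strong connectivity, no self-loops, no parallel channels in the same direction) can be checked mechanically. The Python sketch below is one way to do so, assuming the digraph is given as a list of integer node labels plus a list of directed (source, destination) edges over those labels; `satisfies_topology` is a hypothetical helper. Strong connectivity is tested with two breadth-first searches from the same root, one on the graph and one on its reverse.

```python
from __future__ import annotations

from collections import deque


def _reachable(adj: dict[int, set[int]], root: int) -> set[int]:
    """Breadth-first search: every node reachable from root."""
    seen = {root}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


def satisfies_topology(nodes: list[int], edges: list[tuple[int, int]]) -> bool:
    """True if the digraph has no self-loops, no parallel edges, and is strongly connected."""
    if not nodes:
        return False  # at least one node is required
    if any(u == v for u, v in edges):
        return False  # self-loop
    if len(set(edges)) != len(edges):
        return False  # two channels in the same direction between one pair
    forward: dict[int, set[int]] = {n: set() for n in nodes}
    backward: dict[int, set[int]] = {n: set() for n in nodes}
    for u, v in edges:
        forward[u].add(v)
        backward[v].add(u)
    everyone = set(nodes)
    root = nodes[0]
    # Strongly connected iff root reaches everyone and everyone reaches root.
    return _reachable(forward, root) == everyone and _reachable(backward, root) == everyone


print(satisfies_topology([0, 1, 2], [(0, 1), (1, 2), (2, 0)]))  # True (directed ring)
print(satisfies_topology([0, 1, 2], [(0, 1), (1, 2)]))          # False (no path back to 0)
```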

Two nodes 30 are said to be “adjacent” to each other if they
are connected to each other via a direct communication link.

L, an integer value representing a number of links, denotes
the largest loop in the graph, i.e., the maximum value of the
longest path-lengths from a node 30 back to itself visiting the
nodes 30 along the path only once, except for the first node,
which is also the last. W, also an integer value representing a
number of links, signifies the width or diameter of the graph,
i.e., the maximum value of the shortest path connecting any
two nodes. For digraphs of size K > 1, L and W are bounded by
2 ≤ L ≤ K and 1 ≤ W ≤ K-1.
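For small digraphs, L and W as defined above can be computed directly. The Python sketch below, with hypothetical function names, takes an adjacency mapping over integer node labels: W comes from a breadth-first search out of every node, and L from exhaustively enumerating simple cycles, which is exponential in the worst case and therefore only suitable for small examples.

```python
from __future__ import annotations

from collections import deque


def diameter_W(adj: dict[int, list[int]]) -> int:
    """W: the largest, over all ordered node pairs, of the shortest path length in links."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("digraph is not strongly connected")
        worst = max(worst, max(dist.values()))
    return worst


def largest_loop_L(adj: dict[int, list[int]]) -> int:
    """L: the length in links of the longest simple cycle (exhaustive search)."""
    best = 0

    def extend(start: int, u: int, visited: set[int], length: int) -> None:
        nonlocal best
        for v in adj[u]:
            if v == start:
                best = max(best, length + 1)  # closed a loop back to the first node
            elif v > start and v not in visited:
                # Only visit labels above start, so each cycle is found from its smallest node.
                visited.add(v)
                extend(start, v, visited, length + 1)
                visited.remove(v)

    for s in adj:
        extend(s, s, {s}, 0)
    return best


ring = {0: [1], 1: [2], 2: [3], 3: [0]}  # directed ring, K = 4
print(largest_loop_L(ring), diameter_W(ring))  # 4 3
```

The directed ring sits at the extremes of the stated bounds (L = K and W = K-1), while a fully connected digraph has W = 1.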

The system 10 of FIG. 1 has two synchronization states:
synchronized and unsynchronized. The system 10 is in the
unsynchronized state when it starts up or when it is powered
on. The synchronized state is entered when the nodes 30 of
FIG. 3 are within an expected boundary precision. The
distributed system 10 transitions from the unsynchronized state
to the synchronized state after execution of the present
synchronization protocol 100 of FIG. 5. When the distributed
system 10 of FIG. 1 reaches the synchronized state it is said to
be in synchrony.

Due to inherent drift in the local times, the present protocol
100 of FIG. 5 is to be executed all the time to ensure that the
local times remain synchronized, i.e., executed continuously
in an analog implementation or once per local clock tick in a
digital implementation. The rate of resynchronization is
constrained by physical parameters of the design, for instance
drift rates of the physical oscillators 14, 18, and 22 of FIG. 1,
as well as by precision and accuracy goals. The present approach
addresses achieving and maintaining the precision goal of the
distributed system 10 of FIG. 1. Therefore, the present
protocol enables the distributed system 10 to achieve and
maintain synchrony among the distributed logical clocks 13, 17, and
21 of FIG. 1, not the physical oscillators 14, 18, and 22
shown in the same Figure.

The logical clocks 13, 17, and 21 of FIG. 1 are periodically
synchronized by an exchange of Sync messages between
directly connected nodes 30. That is, a given node 30
selectively transmits a Sync message only to other nodes 30 that
are directly connected to it. The process of periodic and
automatic synchronization after initial synchrony is achieved
is referred to as resynchronization, whereby all nodes 30
reengage in the disclosed synchronization process. A given
node 30 is said to “time-out” when its logical clock 13, 17, or
21 reaches its calibrated maximum value P, i.e., the
resynchronization period, described above with reference to FIG. 2.

The resynchronization process begins when the first node,
herein defined as the fastest node, times-out and transmits a
Sync message. The process ends after the last node, herein
defined as the slowest node, transmits a Sync message. For a
drift rate ρ ≪ 1, the fastest node cannot time-out again before
the slowest node transmits a Sync message. A Sync message
is transmitted either as a result of a resynchronization
time-out, or when a node 30 receives a Sync message(s) indicative
of other nodes 30 engaging in the resynchronization process.
A node 30 is said to be interrupted when it accepts an incoming
Sync message before its clock value LocalTimer reaches
its maximum value, i.e., before it times-out.

Synchronizer and Monitors 

Referring to FIG. 4, transmitted Sync messages from each
node 30 are deposited on communication channels. Each
node 30 includes a synchronizer 28, such as but not limited to
a state machine, and a plurality of monitors 29. To closely
observe the behavior of other nodes, each node 30 employs at
least one monitor 29 and at most K-1 monitors 29. One
monitor 29 is employed for each source of incoming messages,
e.g., from directly connected nodes N_1, N_{i-1}, N_{i+1}, and
N_K. A node 30 neither uses nor monitors its own messages.

Each monitor 29 keeps track of the activities of its
corresponding source node. A monitor 29 detects proper
sequence and timeliness of the received messages from its
corresponding source node, and reads, evaluates, time stamps,
validates, and stores only the last Sync message it receives
from that particular node. Additionally, a monitor 29
ascertains the health condition of its corresponding source node by
keeping track of the current state of that node. As the number
of nodes K increases in the digraph, so does the number of
monitors 29 in each node 30. The monitors 29 may be
implemented as separate physical components from the nodes 30 or
they may be logically implemented as part of the node
functions.
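The monitor bookkeeping described here (keep only the latest Sync from one source, report it to the synchronizer, and dispose of it after one local clock tick) might be sketched as below. This Python sketch assumes, as stated for this protocol, that reception of a Sync already implies its validity, so validation and health tracking are omitted; `Monitor`, `receive`, `poll`, and `sync_pending` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Monitor:
    """Hypothetical monitor for a single directly connected source node."""
    source_id: int
    stamp: int | None = None  # local tick at which the stored Sync was received

    def receive(self, now: int) -> None:
        # Time-stamp the Sync; a newer Sync overwrites any older one.
        self.stamp = now

    def poll(self, now: int) -> bool:
        """Report a stored Sync; dispose of it once it has been kept for one tick."""
        if self.stamp is None:
            return False
        if now - self.stamp >= 1:
            self.stamp = None
            return False
        return True


def sync_pending(monitors: list[Monitor], now: int) -> bool:
    """True if any monitor holds a valid Sync this tick (polls all, so stale ones expire)."""
    results = [m.poll(now) for m in monitors]
    return any(results)


monitors = [Monitor(source_id=1), Monitor(source_id=2)]
monitors[1].receive(now=5)
print(sync_pending(monitors, now=5))  # True
print(sync_pending(monitors, now=6))  # False
```

`sync_pending` builds the full list before calling `any` so that every monitor is polled, and therefore ages out its stored Sync, on every tick.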

Upon conveying the valid Sync message to the local
synchronizer 28, a given monitor 29 disposes of the valid Sync
message after it has been kept for one local clock tick. The
synchronizer 28 describes the behavior of the node, N_i,
utilizing assessment results from its monitors 29, where
monitor_j, i ≠ j, is the particular monitor for the corresponding node
N_j.

A Sync message is transmitted to directly connected nodes 
either as a result of a resynchronization time-out or when a 
node 30 receives a valid Sync message(s) (arrows 32) indica- 
tive of other directly connected nodes 30 engaging in a resyn- 
chronization event. A node 30 periodically undergoes a resyn- 
chronization process either when its LocalTimer times out or 
when it receives a Sync message. If it times out, it broadcasts 
a Sync message (arrow 132 of FIG. 4) and so initiates a new 
round of the resynchronization process. 

However, since only detectable faults are assumed, i.e.,
F=0 where F is the maximum number of faulty nodes, when
a node 30 receives a Sync message, except in a predefined
ignore window bounded to [D, T_S], it accepts the Sync message
and undergoes the resynchronization process, in which it
resets its clock value LocalTimer and relays the Sync message
(arrow 132) to other directly connected nodes 30. This process
continues until all of the nodes 30 participate in the
resynchronization process and converge to a guaranteed precision.
The predefined window in which the node 30 ignores all
incoming Sync messages, i.e., the ignore window, provides a
means for the protocol to stop the endless cycle of
resynchronization processes that would otherwise be triggered by
follow-up Sync messages.

Sync Message

In order to achieve synchrony, the nodes 30 communicate
by exchanging Sync messages with other directly connected
nodes, as noted above. The Sync message is the only type of
message used by the protocol to self-stabilize the digraph.
When the system 10 of FIG. 1 is in synchrony, the protocol
overhead is at most one Sync message per resynchronization
period (P), where P has units of real time clock ticks and is
defined as the upper bound on the time interval between any
two consecutive resets of the clock value LocalTimer by a
given node 30. Assuming physical-layer error detection is
dealt with separately, the reception of a Sync message by any
given node 30 is indicative of its validity in the value domain.
The present protocol 100 of FIG. 5 and its embodiments of
FIGS. 5A and 5B thus perform as intended when the timing
requirements of the messages from every node 30 are
satisfied. However, in the absence of non-detectable faults, the
reception of a Sync message is indicative of its validity in both the
value and time domains. A valid Sync message is discarded
after it is relayed to the synchronizer and has been kept for one
local clock tick.

Protocol 

The following protocol assumptions are made: (1) the
number of nodes 30 is denoted by K, where K ≥ 1; (2) all nodes
30 correctly execute the protocol; (3) all links correctly
transmit data from their sources to their destinations; (4) T is a
non-partitioned, strongly connected digraph; (5) 0 ≤ ρ ≪ 1; (6)
a Sync message sent by any given node 30 will be received
and processed by all adjacent nodes 30 within the duration of
γ, where γ = D + d; and (7) initial values of the variables of a
node 30 are within their corresponding data-type range,
although possibly with arbitrary values. In a physical
implementation, it is expected that some local mechanism exists to
enforce type consistency for all variables.

The Distributed Self-Stabilizing Clock Synchronization 
Problem 

To simplify the present protocol 100 of FIG. 5 and its
alternative embodiments discussed below, it is assumed that
all time references are with respect to an initial real time t_0,
where t_0 = 0 when the above-listed protocol assumptions are
satisfied, and for all t ≥ t_0 the system 10 of FIG. 1 operates
within the protocol assumptions noted above.

The maximum difference in the values of LocalTimer
for all pairs of nodes at time t, Δ_Net(t), is determined by the
following equations, which account for the variations in the
values of LocalTimer across all nodes:

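$$r = \lceil (W+1)(\gamma + \delta(\gamma)) \rceil,$$

$$\mathrm{LocalTimer}_{\min}(x) = \min_i \big( N_i.\mathrm{LocalTimer}(x) \big),$$

$$\mathrm{LocalTimer}_{\max}(x) = \max_i \big( N_i.\mathrm{LocalTimer}(x) \big),$$

$$\Delta_{Net}(t) = \min\Big( \mathrm{LocalTimer}_{\max}(t) - \mathrm{LocalTimer}_{\min}(t),\ \mathrm{LocalTimer}_{\max}(t-r) - \mathrm{LocalTimer}_{\min}(t-r) \Big),$$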

where: 

C is a bound on the maximum convergence time, wherein
the protocol deterministically converges to synchrony
within the time bound C as a linear or substantially
linear function of P. While substantially non-linear functions
are possible, such functions may result in a lack of
determinism and/or difficulty of analysis;

Δ_Net(t), for real time t, is the maximum difference between
the LocalTimer values of any two nodes (i.e.,
the relative clock skew) for t ≥ t₀; and

π, the synchronization precision, is the guaranteed upper
bound on Δ_Net(t) for all t ≥ C.
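The role of the look-back r in Δ_Net(t) is easiest to see on a recorded trace. The sketch below is illustrative only: it assumes a hypothetical `history[t][i]` table of LocalTimer values sampled once per tick starting at t₀ = 0, takes r as an input rather than computing it, and uses a made-up five-tick trace with P = 10 in which two nodes wrap to 0 one tick before the third. The helper names `spread`, `delta_net`, and `closure_holds` are not from the source.

```python
from typing import Sequence

Trace = Sequence[Sequence[int]]  # history[t][i] = LocalTimer of node i at tick t


def spread(history: Trace, x: int) -> int:
    """LocalTimer_max(x) - LocalTimer_min(x) over all nodes at tick x."""
    return max(history[x]) - min(history[x])


def delta_net(history: Trace, t: int, r: int) -> int:
    """Delta_Net(t): the smaller of the spreads at t and at t - r.

    Looking back r ticks hides the apparent jump of about P that appears
    while some nodes have wrapped around to 0 and others have not.
    """
    if t - r < 0:
        return spread(history, t)  # assumption: no look-back before t0 = 0
    return min(spread(history, t), spread(history, t - r))


def closure_holds(history: Trace, C: int, r: int, pi: int) -> bool:
    """Closure on a finite trace: Delta_Net(t) <= pi for every recorded t >= C."""
    return all(delta_net(history, t, r) <= pi for t in range(C, len(history)))


# P = 10: nodes 0 and 1 wrap to 0 at tick 4 while node 2, one tick behind, reads 10.
trace = [[7, 7, 6], [8, 8, 7], [9, 9, 8], [10, 10, 9], [0, 0, 10]]
print(spread(trace, 4))        # 10: the raw spread overstates the skew
print(delta_net(trace, 4, 2))  # 1
```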

There exist C and π such that the following self-stabilization
properties hold:

Convergence: Δ_Net(C) ≤ π, 0 ≤ π < P;

Closure: for all t ≥ C, Δ_Net(t) ≤ π;

Congruence: for all nodes N_i, for all t ≥ C, (N_i.LocalTimer
(t) = γ) implies Δ_Net(t) ≤ π; and

Liveness: for all t ≥ C, the LocalTimer of every node
sequentially takes on at least all integer values in [γ,
P − π].

Self-Stabilizing Distributed Clock Synchronization Protocol 
for Arbitrary Digraphs 

The protocol 100 of FIG. 5 and its embodiments of FIGS.
5A and 5B use a synchronizer 28 and a set of monitors 29, as
shown in FIG. 4, both of which execute once every local clock
tick. The following parameters apply when all links are
bidirectional:


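$$T_S \geq (W+2)(\gamma + \delta(\gamma))$$

$$P \geq 3T_S, \quad \text{for } \rho = 0$$

$$P \geq 3(T_S + \delta(T_S)), \quad \text{for } L = K \text{ and } \rho > 0$$

$$P \geq \max\big( (2W+1)(\gamma + \delta(\gamma)),\ 3(T_S + \delta(T_S)) \big), \quad \text{for } L = f(T) \text{ and } \rho > 0.$$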

The following is a list of protocol parameters for digraphs,
i.e., when at least one link is unidirectional:

$$T_S \geq (K+2)(\gamma + \delta(\gamma))$$

$$P \geq K(T_S + \delta(T_S))$$

Regardless of the types of links in the network 11 of FIG. 3,
the following is a list of protocol measures:

$$C_{Init} = 2P + K(\gamma + \delta(\gamma))$$

$$\Delta_{Init} \leq (K-1)(\gamma + \delta(\gamma))$$

$$C = C_{Init} + \lceil \Delta_{Init} / \gamma \rceil P$$

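$$\Delta_{InitGuaranteed} \leq W(\gamma + \delta(\gamma)), \quad \text{for all } t \geq C$$

$$\pi = \Delta_{InitGuaranteed} + \delta(P), \quad \text{for all } t \geq C \text{ and } 0 \leq \pi \leq P.$$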
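Two of the measures, the coarse initial skew bound Δ_Init ≤ (K − 1)(γ + δ(γ)) and the convergence bound C = C_Init + ⌈Δ_Init/γ⌉P, recur elsewhere in this description and can be evaluated directly. The sketch below is a hedged illustration: the drift term δ(γ) is not defined in this section, so it is passed in as a plain number (`delta_gamma`), C_Init is likewise supplied by the caller, and the function names and sample values (K = 5, γ = 2 ticks, C_Init = 100, P = 50) are hypothetical.

```python
import math


def delta_init_bound(K: int, gamma: float, delta_gamma: float) -> float:
    """Coarse initial skew bound: Delta_Init <= (K - 1)(gamma + delta(gamma))."""
    return (K - 1) * (gamma + delta_gamma)


def convergence_time(c_init: float, delta_init: float, gamma: float, P: float) -> float:
    """C = C_Init + ceil(Delta_Init / gamma) * P."""
    return c_init + math.ceil(delta_init / gamma) * P


d_init = delta_init_bound(K=5, gamma=2, delta_gamma=0)                 # 8
print(d_init)
print(convergence_time(c_init=100, delta_init=d_init, gamma=2, P=50))  # 100 + 4 * 50 = 300
```

Because of the ceiling, every additional γ of initial skew can cost a full resynchronization period P of convergence time.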

A trivial solution would be P = 0. Since P > T_S and LocalTimer
is reset only after reaching P (worst-case wraparound),
a trivial solution is not possible.
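The decision sequence of FIG. 5, walked through step by step in the paragraphs that follow, reduces to one update per local clock tick. This is a minimal sketch, not the patented implementation: the `Params` container and `synchronizer_tick` function are hypothetical names, all quantities are assumed to be integer local clock ticks, and the sample values (D = 1, γ = 2, so d = 1; T_S = 6; P = 20) are chosen only to make each rule visible.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Params:
    D: int      # minimum event-response delay, in local ticks
    gamma: int  # communication latency, gamma = D + d
    T_S: int    # graph threshold
    P: int      # resynchronization period


def synchronizer_tick(local_timer: int, valid_sync: bool, p: Params) -> Tuple[int, bool]:
    """One local tick of rules E0-E4; returns (new LocalTimer, transmit Sync?)."""
    if local_timer < 0:                      # E0 (steps 102/103): repair an out-of-range value
        return 0, False
    if valid_sync and local_timer < p.D:     # E1 (steps 105/106): interrupted, no relay
        return p.gamma, False
    if valid_sync and local_timer >= p.T_S:  # E2 (steps 108/110): interrupted, relay Sync
        return p.gamma, True
    if local_timer >= p.P:                   # E3 (steps 112/114): timed out, reset and send
        return 0, True
    return local_timer + 1, False            # E4 (step 116): ordinary tick


p = Params(D=1, gamma=2, T_S=6, P=20)
print(synchronizer_tick(0, True, p))   # (2, False): E1
print(synchronizer_tick(7, True, p))   # (2, True): E2
print(synchronizer_tick(3, True, p))   # (4, False): Sync ignored between D and T_S
print(synchronizer_tick(20, False, p)) # (0, True): E3
```

A Sync that arrives while D ≤ LocalTimer < T_S falls through to E4, which is how the ignore window arises without any extra state.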

Referring now to the example flow chart of FIG. 5, the
protocol 100 is shown in one possible embodiment with
respect to a particular node 30 of FIG. 3. Beginning at step
102, the synchronizer 28 for a particular node 30 determines
whether the clock value LocalTimer at that node 30 is less
than zero. As noted above, this condition, in which LocalTimer
has a negative value, should not ordinarily be present, but
step 102 is still provided as a preventive safety measure. The
protocol 100 proceeds to step 103 if LocalTimer is less than 0;
otherwise, it proceeds to step 104.

At step 103, the synchronizer 28 resets the clock value
LocalTimer, i.e., sets LocalTimer to zero, and
then returns to step 102. Steps 102 and 103 are alternatively
represented in the pseudo-code below as logic statement E0.

At step 104, the synchronizer 28 determines whether a valid
Sync message has been received at its node 30, referred to in
this context as the receiving node. If a valid Sync message has
been received, the protocol 100 proceeds to step 105; otherwise,
it proceeds to step 112.

At step 105, the synchronizer 28, having received a valid
Sync message at step 104, determines whether the current
clock value LocalTimer is less than the minimum event-response
delay D. If so, the protocol 100 proceeds to step 106; if
LocalTimer is greater than or equal to D, it proceeds instead
to step 108.

At step 106, the synchronizer 28 determines that an
interruption has occurred and sets the clock value LocalTimer
equal to the communication latency γ. Thereafter,
the protocol 100 returns to step 102. Steps 104, 105, and 106
are alternatively represented in the pseudo-code below as
logic statement E1.

At step 108, the synchronizer 28 determines whether the
current clock value LocalTimer, given that a valid Sync message
was received at step 104, equals or exceeds the graph
threshold T_S. If it does, the protocol 100 proceeds to step 110.
However, if LocalTimer is less than the graph threshold, the
protocol 100 proceeds instead to step 112.

At step 110, the synchronizer 28 determines that an
interruption has occurred, sets the clock value LocalTimer
equal to the communication latency γ, and also transmits a
Sync message to all other nodes directly connected to its node
30. Thereafter, the protocol 100 returns to step 102. Steps 108
and 110 are alternatively represented in the pseudo-code below
as logic statement E2.

At step 112, the synchronizer 28 determines whether
LocalTimer equals or exceeds its maximum value, i.e., P. If so,
the synchronizer 28 determines that the node 30 being evaluated
has timed out and proceeds to step 114. If LocalTimer is less
than P, the protocol 100 proceeds instead to step 116.

At step 114, the synchronizer 28 resets its logical clock, i.e.,
sets LocalTimer = 0, transmits a Sync message to all directly
connected nodes as noted above, and returns to step 102. Steps
112 and 114 are alternatively represented in the pseudo-code
below as logic statement E3.

At step 116, the synchronizer 28, having determined at step
112 that LocalTimer is less than P, increments its clock value,
i.e., LocalTimer = LocalTimer + 1, regardless of whether a valid
Sync message was received at step 104, and returns to step 102.
Step 116 is alternatively represented in the pseudo-code below
as logic statement E4.

In the protocol 100, if one or more Sync messages arrive and
either of the conditions of steps 105 or 108 is true, the
LocalTimer for that node is not incremented.

Referring briefly to FIG. 6, the embodiment of the protocol
100 shown in FIG. 5 may be physically embodied as a logic
circuit 15, for instance residing in the networked devices 12 of
FIG. 1, with similar logic circuits 19 and 23 residing in the
other devices 16 and 20 of the same figure. A monitor 29
determines whether a valid Sync message is generated by its
monitored node, as noted above, and feeds this information
into a set of logic gates embodying portions of the synchronizer
28 of FIG. 4. Various logic blocks 42, 44, 46 perform the
indicated comparisons, e.g., whether the clock value LocalTimer
is less than D in block 44, greater than or equal to T_S in
block 42, or greater than or equal to P in block 46.

The logical time clock 13 may be embodied as a type of
flip-flop as shown, receiving an oscillator signal 41 from its
local physical oscillator (OSC) 14 and outputting its local
clock signal as the clock value LocalTimer used for all
comparison steps of the protocol 100 of FIG. 5. Other logical
embodiments may be used to encode the logic set forth in
FIG. 5 without departing from the intended inventive scope.

Pseudo-code as noted above in the description of the flow
chart of FIG. 5 may be readily envisioned as a series of logic
statements E0-E4, with corresponding comments denoted by //:

E0: if (LocalTimer < 0)
        LocalTimer := 0
E1: elseif (ValidSync() and (LocalTimer < D))
        LocalTimer := γ                // interrupted
E2: elseif (ValidSync() and (LocalTimer ≥ T_S))
        LocalTimer := γ                // interrupted
        Transmit Sync
E3: elseif (LocalTimer ≥ P)            // timed out
        LocalTimer := 0
        Transmit Sync
E4: else
        LocalTimer := LocalTimer + 1

Additional Discussion

From the expression for Δ_Init, the synchronization time C
and precision π are functions of the network topology T and
the drift rate ρ, specifically the graph's width W and the
amount of drift the network experiences. In other words,
C = f(W, δ(P)) and π = f(W, δ(P)).

From the expressions for Δ_Init and Δ_InitGuaranteed, it follows
that for networks with small W values Δ_InitGuaranteed is reached
instantaneously, but for networks with large W values reaching
Δ_InitGuaranteed is a gradual process. The general equation for
Δ_Init applies to the ideal (ρ = 0, d = 0) and semi-ideal (ρ = 0, d > 0)
scenarios. For these scenarios, Δ_Init ≤ Wγ.

Although the initial (coarse) synchrony, Δ_Init, occurs
within C_Init, the initial guaranteed precision, Δ_InitGuaranteed,
takes place only after a number of periods following the
initial synchrony. The general equation for π applies to the
ideal and semi-ideal scenarios. Since Δ_InitGuaranteed = f(W,
δ(P)), for large values of δ(P), Δ_InitGuaranteed ≥ Δ_Init and no
improvement on Δ_Init is achievable. However, since typically
0 < ρ ≪ 1, for small values of δ(P), Δ_InitGuaranteed < Δ_Init and
improvement on Δ_InitGuaranteed is possible.

In particular, for the ideal and semi-ideal scenarios,
subsequent resynchronization processes beyond the initial
synchrony result in tighter precision. Specifically, for t ≥ C_Init +
⌈Δ_Init/γ⌉P, the ideal scenario yields Δ_InitGuaranteed = 0
and π = 0, while the semi-ideal scenario yields
Δ_InitGuaranteed = Wd and π = Wd. Therefore, Δ_InitGuaranteed = 0,
Wd, and W(γ + δ(γ)) for the ideal, semi-ideal, and realizable
(ρ > 0, d ≥ 0) systems, respectively.

After synchrony in the ideal scenario, the nodes periodically
pulsate in perfect unison (true synchrony). In the semi-ideal
scenario, even in the absence of drift, the system's
behavior resembles a ripple effect in which the nodes remain at
most d apart from each other, with the leading node as the
center and originator of the ripple. Realizable systems also
exhibit a ripple effect, due to drift. However, when the nodes
periodically pulsate, depending on the amount of drift, the nodes
remain at most one duration γ apart from each other, with the
leading node as the center and originator of the ripple.
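The ideal-scenario behavior can be reproduced with a small lockstep simulation. This sketch rests on several assumptions that are not part of the source: ρ = 0 and d = 0, so γ = D = 1 tick; a Sync sent during tick t is seen by each directly connected node during tick t + 1 and then discarded; T_S = 5 and P = 20 are illustrative values chosen only so that D < T_S < P; and the graph is a bidirectional linear graph with K = 3 (W = 2) started from the arbitrary state [0, 7, 13]. The names `tick` and `simulate` are hypothetical.

```python
from typing import List, Tuple

# Assumed ideal-scenario parameters (rho = 0, d = 0): gamma = D = 1 tick.
D, GAMMA, T_S, P = 1, 1, 5, 20


def tick(timer: int, sync: bool) -> Tuple[int, bool]:
    """Rules E0-E4 of FIG. 5 for one node; returns (new LocalTimer, transmit Sync?)."""
    if timer < 0:
        return 0, False
    if sync and timer < D:
        return GAMMA, False
    if sync and timer >= T_S:
        return GAMMA, True
    if timer >= P:
        return 0, True
    return timer + 1, False


def simulate(adj: List[List[int]], timers: List[int], ticks: int) -> List[int]:
    """Lockstep run; a Sync sent during tick t is seen by each neighbor during tick t + 1."""
    inbox = [False] * len(timers)
    for _ in range(ticks):
        outbox = [False] * len(timers)
        new_timers = []
        for node, (timer, sync) in enumerate(zip(timers, inbox)):
            value, send = tick(timer, sync)
            new_timers.append(value)
            if send:
                for neighbor in adj[node]:
                    outbox[neighbor] = True
        timers, inbox = new_timers, outbox  # monitors discard messages after one tick
    return timers


line_graph = [[1], [0, 2], [1]]  # bidirectional linear graph, K = 3, W = 2
print(simulate(line_graph, [0, 7, 13], 11))   # [2, 3, 3]: node 0, two hops away, trails by one tick
print(simulate(line_graph, [0, 7, 13], 100))  # [8, 8, 8]: the nodes now pulsate in unison
```

After the first resynchronization, node 0, two hops from the node that timed out first, trails by one tick. At the next period, nodes 1 and 2 time out together and the relayed Sync pulls node 0 in, after which all three nodes stay equal.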

Recall that π = f(W, δ(γ)) and C = f(W, δ(γ)). Therefore,
depending on the values of W and δ(γ), the precision of the
network and the convergence time may be quite large. From the
expression for π, it follows that for networks with small W
values synchronization occurs instantaneously with optimal
precision, while for networks with large W values
synchronization is a gradual process with a larger (coarser)
precision bound. For instance, for a fully connected graph,
W = 1, π = d + δ(γ) is at its minimum with minimal dependence on
drift, and the convergence time is at its minimum value,
C = C_Init; whereas for the linear graph, W = K − 1, π is at its
maximum and more dependent on drift, and the convergence time is
at its maximum value of C. Indeed, in the worst case, where drift
is very high, no improvement on Δ_Init is possible no matter how
much time passes. So, to achieve a desired precision, one must
reduce W, δ(P), or both.
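Because both π and C grow with W, it helps to compute W for a candidate topology before deciding whether to add links. The sketch below assumes that the graph width W is the largest shortest-path hop count over all ordered pairs of nodes (the directed diameter). That reading matches the two examples given, W = 1 for a fully connected graph and W = K − 1 for a linear graph, but this section does not define W formally. The function name `graph_width` and the adjacency-list format are hypothetical.

```python
from collections import deque
from typing import List


def graph_width(adj: List[List[int]]) -> int:
    """Largest shortest-path hop count over all ordered node pairs (directed diameter)."""
    n = len(adj)
    width = 0
    for src in range(n):
        dist = [-1] * n
        dist[src] = 0
        queue = deque([src])
        while queue:  # breadth-first search along outgoing links
            u = queue.popleft()
            for v in adj[u]:
                if dist[v] < 0:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if min(dist) < 0:
            raise ValueError("digraph is partitioned (not strongly connected)")
        width = max(width, max(dist))
    return width


linear = [[1], [0, 2], [1, 3], [2, 4], [3]]                     # K = 5, bidirectional
complete = [[j for j in range(4) if j != i] for i in range(4)]  # K = 4, fully connected
ring = [[1], [2], [3], [0]]                                     # K = 4, unidirectional ring
print(graph_width(linear), graph_width(complete), graph_width(ring))  # 4 1 3
```

Recomputing the width after adding candidate edges shows directly whether a given link reduces W.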

To reduce W, new links may be added to the graph such that
the graph width W is halved and its precision doubled. This
implies that the number of links (edges) to be added, E, is
given by:

$$E \geq \lceil \log_2 W \rceil$$

More accurate oscillators are needed to reduce drift. However,
more accurate oscillators are more costly. Sometimes a graph
cannot or should not be modified by adding new links, and as
there are no perfect oscillators, drift cannot be improved
beyond a certain limit. Thus, other ways of achieving synchrony
faster and with tighter precision are now discussed, along with
variations of the protocol 100 of FIG. 5.
Variations of the Protocol 
Variation #1: Time Shift 

In the "if" expressions for E1, E2, and E3 in the above
pseudo-code, one can add a value to, or subtract a value from,
the right-hand side of each comparison with LocalTimer. In other
words, E1 can be written as:

    elseif (ValidSync() and (LocalTimer < (D ± X)))

and E2 as:

    elseif (ValidSync() and (LocalTimer ≥ (T_S ± X)))

and E3 as:

    elseif (LocalTimer ≥ (P ± X))


10 


15 


with X being the same value for all of the E1, E2, and E3
expressions, for some X > 0. Of particular interest is the case
X > D. In this case, and in conjunction with E0, E1 is not
needed and can be eliminated, further simplifying the protocol.
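Why E1 becomes removable can be checked mechanically. With the minus sign (the only sign for which E1 can drop out), X > D makes D − X negative, and E0 already guarantees that any LocalTimer reaching the E1 test is non-negative, so the shifted E1 condition can never hold. The sketch below is an illustration under those assumptions: `tick_time_shift` and its `include_e1` switch are hypothetical names, and the sample values D = 1, γ = 2, T_S = 8, P = 30, and X = 3 are arbitrary.

```python
from typing import Tuple


def tick_time_shift(timer: int, sync: bool, D: int, gamma: int, T_S: int, P: int,
                    X: int, include_e1: bool = True) -> Tuple[int, bool]:
    """Rules E0-E4 with X subtracted from the D, T_S, and P thresholds (Variation #1)."""
    if timer < 0:                                # E0
        return 0, False
    if include_e1 and sync and timer < D - X:    # E1 (shifted)
        return gamma, False
    if sync and timer >= T_S - X:                # E2 (shifted)
        return gamma, True
    if timer >= P - X:                           # E3 (shifted)
        return 0, True
    return timer + 1, False                      # E4


params = dict(D=1, gamma=2, T_S=8, P=30)
# X > D: dropping E1 never changes the outcome.
same = all(
    tick_time_shift(t, s, X=3, **params) == tick_time_shift(t, s, X=3, include_e1=False, **params)
    for t in range(-5, 40) for s in (False, True)
)
print(same)  # True
# Without a shift, E1 still matters: a Sync at LocalTimer = 0 yields gamma instead of 1.
print(tick_time_shift(0, True, X=0, **params))                    # (2, False)
print(tick_time_shift(0, True, X=0, include_e1=False, **params))  # (1, False)
```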

Variation #2: Reset 

One of the key elements of the present protocol 100 of FIG.
5 is the proper setting of the clock value LocalTimer upon
receiving a broadcast Sync message from a directly connected
node. In the embodiment of the protocol shown in FIG. 5,
LocalTimer is set to the communication latency γ. Thus, when a
node 30 times out, it resets its clock value LocalTimer, i.e.,
LocalTimer = 0, and after one duration γ the transmitting and
receiving nodes are naturally in relative synchrony, at most d
clock ticks from each other. If LocalTimer is instead set to D,
the protocol 100 behaves similarly but with lower precision. As
noted below, setting LocalTimer to any value less than γ
produces lower precision than setting it to the latency γ.

Setting the clock value LocalTimer to other values may not
produce the desired effect. On the other hand, if a node is
interrupted, the receiving nodes have no knowledge of the
broadcasting node's LocalTimer value, which could be either
0 or γ. The clock value LocalTimer is set to γ upon interrupt, as
noted above; however, it could be assigned other values equal
to or greater than 0. An arbitrary value will not produce
the desired synchrony, but if the value of the broadcasting
node's LocalTimer is forwarded, then the LocalTimer of the
receiving node could be set to that value, offset by γ, and once
again the two nodes would be in relative synchrony.

In this variation, the clock value LocalTimer is reset, i.e.,
LocalTimer = 0, upon receiving a Sync message, rather than being
set to γ as in steps 106 and 110 of FIG. 5.

Referring briefly to FIG. 5A, steps 106 and 110 of FIG. 5
are simply replaced by alternative steps 206 and 210, with all
other steps of the protocol appearing as in FIG. 5. Thus,
FIG. 5A is to be read in conjunction with FIG. 5.

Step 206 of FIG. 5A includes having the synchronizer 28 
for a particular node determine that an interruption has 
occurred and resetting the clock value LocalTimer. Thereaf- 
ter, the protocol returns to step 102 of FIG. 5. 

Likewise, at step 210 the synchronizer 28 determines that
an interruption has occurred. Here, the synchronizer 28 resets
the clock value LocalTimer and also transmits a Sync message
to all other nodes that are directly connected to the node
of the synchronizer 28 acting at step 210. Thereafter, the
protocol 100 returns to step 102 of FIG. 5.

This variation also synchronizes the network for ρ > 0 and
d > 0 with the same Δ_Init, i.e., Δ_Init ≤ (K − 1)(γ + δ(γ)). Also,
when ρ = 0 and d = 0, unlike the protocol 100 of FIG. 5, where
Δ_InitGuaranteed = 0, this variation yields Δ_InitGuaranteed = Wγ.
Setting the clock value LocalTimer to other values between 0 and
γ would produce results similar to those of the protocol 100 of
FIG. 5 and of this variation, with 0 < Δ_InitGuaranteed < Wγ. Since
Δ_InitGuaranteed = Wγ in this variation, even in the absence of
drift the system's behavior resembles a ripple effect where nodes
remain at most γ apart from each other, with the leading node as
the center and originator of the ripple.

Pseudo-code for this variation is as follows: 


E0: if (LocalTimer < 0)
        LocalTimer := 0
E1: elseif (ValidSync() and (LocalTimer < D))
        LocalTimer := 0
E2: elseif (ValidSync() and (LocalTimer ≥ T_S))
        LocalTimer := 0
        Transmit Sync
E3: elseif (LocalTimer ≥ P)            // timed out
        LocalTimer := 0
        Transmit Sync
E4: else
        LocalTimer := LocalTimer + 1
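The γ-wide ripple of this variation shows up after a single exchange between two neighbors. The sketch assumes the ideal values D = γ = 1 tick (d = 0), with illustrative thresholds T_S = 5 and P = 20. The function name `tick_reset_variant` is hypothetical. A node that times out restarts at 0 and reads 1 one tick later. Its neighbor sees the Sync during that same tick and is reset to 0, so the receiver trails the sender by γ.

```python
from typing import Tuple


def tick_reset_variant(timer: int, sync: bool, D: int, T_S: int, P: int) -> Tuple[int, bool]:
    """Variation #2 (FIG. 5A): an interrupt resets LocalTimer to 0 instead of gamma."""
    if timer < 0:               # E0
        return 0, False
    if sync and timer < D:      # E1: interrupted (step 206)
        return 0, False
    if sync and timer >= T_S:   # E2: interrupted, relay Sync (step 210)
        return 0, True
    if timer >= P:              # E3: timed out
        return 0, True
    return timer + 1, False     # E4


# Assumed ideal values: D = gamma = 1 tick (d = 0), T_S = 5, P = 20.
sender, sent = tick_reset_variant(20, False, D=1, T_S=5, P=20)       # times out: (0, True)
sender, _ = tick_reset_variant(sender, False, D=1, T_S=5, P=20)      # next tick: 1
receiver, relayed = tick_reset_variant(17, True, D=1, T_S=5, P=20)   # same tick, Sync seen: (0, True)
print(sender - receiver)  # 1: the receiver trails the sender by gamma
```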


Variation #3: Jump Ahead 

In this variation, the current value of LocalTimer is transmitted
along with the Sync message. Referring briefly to FIG.
5B, steps 106, 110, and 114 of FIG. 5 are simply replaced by
alternative steps 306, 310, and 314. Step 306 entails
determining via the synchronizer 28 that an interruption has
occurred and setting the clock value LocalTimer equal to the
sum of the incoming LocalTimer value from the transmitting
node, LocalTimerIn, plus the communication latency γ, i.e.,
LocalTimer = LocalTimerIn + γ. Thereafter, the protocol
returns to step 102 as explained above according to FIG. 5.

Likewise, at step 310 the synchronizer 28 determines that
an interruption has occurred, sets the clock value LocalTimer
equal to the sum of the incoming LocalTimer value from the
transmitting node, LocalTimerIn, plus the communication
latency γ, i.e., LocalTimerIn + γ, and also transmits a Sync
message and the clock value LocalTimer to all other nodes
directly connected to that node. Thereafter, the protocol 100
returns to step 102 as shown in FIG. 5.

Step 314 entails resetting the clock value LocalTimer and 
transmitting a Sync message and the clock value LocalTimer 
to all nodes directly connected to that node. Thereafter, the 
protocol 100 returns to step 102. 

This variation introduces more overhead due to the transmission
of the LocalTimer value, but synchronizes the network for
ρ > 0 and d > 0 with the same initial precision. In other
words, Δ_Init ≤ (K − 1)(γ + δ(γ)). However, the variation produces
a tighter initial guaranteed precision for the same convergence
time, i.e., Δ_InitGuaranteed ≤ (1 + d)δ(P) and C = C_Init + ⌈Δ_Init/γ⌉P.

This variation also requires a greater number of Sync message
exchanges during the convergence process. The excess
transmission of Sync messages is due to the burst of relayed
Sync messages prior to convergence. Note that, since a node's
LocalTimer is incremented after it receives a Sync message,
all messages eventually die out once the LocalTimer of a node
reaches or exceeds its maximum value P. In the protocol 100 of
FIG. 5, setting the LocalTimer of a node to γ makes that node
immediately enter the ignore window, a time interval during
which it ignores all incoming Sync messages. In this variation,
however, depending on the initial value of the LocalTimer of a
given node, a message may not be ignored until the LocalTimer
of that node eventually reaches or exceeds its maximum value P
and the node then enters the ignore window.

Also, due to an interrupt, the slowest nodes may never be
set to γ during a resynchronization process, even when the
system is in synchrony. As a result (Theorem Congruence),
for t ≥ C the nodes are in synchrony when N_i.LocalTimer(t) =
Wγ. In the original protocol 100 of FIG. 5, for all t ≥ C, the
LocalTimer of every node sequentially takes on at least all
integer values in [γ, P − π]. For this variation, however, the
minimum range of values is [Wγ, P − π].

Pseudo-code for this alternative embodiment of the proto- 
col 100 of FIG. 5 may be readily envisioned as a series of logic 
statements E0-E4: 


E0: if (LocalTimer < 0)
        LocalTimer := 0
E1: elseif (ValidSync() and (LocalTimer < D))
        LocalTimer := LocalTimerIn + γ      // interrupted
E2: elseif (ValidSync() and (LocalTimer ≥ T_S))
        LocalTimer := LocalTimerIn + γ      // interrupted
        Transmit Sync and LocalTimer
E3: elseif (LocalTimer ≥ P)                 // timed out
        LocalTimer := 0
        Transmit Sync and LocalTimer
E4: else
        LocalTimer := LocalTimer + 1
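The jump-ahead rules differ from FIG. 5 only in what is written to LocalTimer and what travels with the Sync. A minimal sketch under stated assumptions follows. The function name `tick_jump_ahead` is hypothetical, and the `incoming` argument carries a single LocalTimerIn value, or None when no Sync arrived. The description does not say which value to use when several Sync messages arrive in the same tick, so that choice is left to the caller. The return value is the new LocalTimer plus the value to transmit with the Sync, or None when nothing is sent.

```python
from typing import Optional, Tuple


def tick_jump_ahead(timer: int, incoming: Optional[int], D: int, gamma: int,
                    T_S: int, P: int) -> Tuple[int, Optional[int]]:
    """Variation #3 (FIG. 5B): adopt LocalTimerIn + gamma on interrupt."""
    if timer < 0:                                # E0
        return 0, None
    if incoming is not None and timer < D:       # E1 (step 306): no relay
        return incoming + gamma, None
    if incoming is not None and timer >= T_S:    # E2 (step 310): relay updated value
        updated = incoming + gamma
        return updated, updated
    if timer >= P:                               # E3 (step 314): send Sync with value 0
        return 0, 0
    return timer + 1, None                       # E4


# Illustrative values: D = gamma = 1 tick, T_S = 5, P = 20.
print(tick_jump_ahead(12, 4, D=1, gamma=1, T_S=5, P=20))     # (5, 5): jump ahead and relay
print(tick_jump_ahead(3, 9, D=1, gamma=1, T_S=5, P=20))      # (4, None): inside the ignore window
print(tick_jump_ahead(20, None, D=1, gamma=1, T_S=5, P=20))  # (0, 0): timed out
```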


Digraphs and Dynamic Graphs 

As noted above, the general form of the distributed
synchronization problem (S) is defined by the following
septuple:

$$S = (K, T, D, d, \rho, P, F),$$

i.e., the number of nodes (K), network topology (T), event-response
delay (D), communication imprecision (d), oscillator
drift rate (ρ), synchronization period (P), and number of
faults (F), respectively. The most general form of the
problem (S) may be described by the following septuple:

$$S' = (K(t), T(t), D, d, \rho, P, F),$$

where K(t) represents the dynamic node count and T(t) rep- 
resents the dynamic topology for a given K(t). In a dynamic 
node count the number of nodes comprising the network can 
change at any given time, and the presented protocol and its 
variations are readily applicable to this scenario provided the 
new nodes enter the network from a reset state where they are 
clear of all residual effects. The dynamic topology allows for 
topologies with any combination of unidirectional and bidi- 
rectional links as described above, whether static or dynamic. 
That is, for a given K(t) the number of links can change at any 
time. 

While the best modes for carrying out the invention have 
been described in detail, those familiar with the art to which 
this invention relates will recognize various alternative 
designs and embodiments for practicing the invention within 
the scope of the appended claims. 

The invention claimed is: 

1. A self-stabilizing network comprising:
a node that includes:
a synchronizer;

a set of monitors in communication with the synchronizer,
wherein each monitor in the set of monitors is
configured to receive a transmitted Sync message, and
wherein the number of monitors in the set of monitors
is no more than one fewer than the number of nodes in
the network;
a physical oscillator; and

a logical time clock driven by the physical oscillator and
having a variable clock value that locally tracks the
passage of clock time for the node;

wherein the synchronizer, without using a global clock or a
globally-generated signal, globally-generated pulse, or
globally-generated message of any kind for self-stabilization,
executes a predetermined protocol to thereby:
set the clock value equal to 0 when the clock value is less
than 0;

set the clock value equal to between 0 and a communication
latency value (γ) when the Sync message is
received by the synchronizer and the clock value is
less than a minimum event-response delay (D);

set the clock value equal to between 0 and γ and transmit
a new Sync message when:

the Sync message is received by the synchronizer; and
the clock value is greater than or equal to both D and
a calibrated graph threshold (T_S);

set the clock value to 0 and transmit a new Sync message
when the Sync message is not received by the
synchronizer and the clock value is greater than or equal
to a calibrated resynchronization period (P); and
increment the clock value when the Sync message is not
received by the synchronizer and the clock value is
less than P;

wherein the network is an arbitrary, non-partitioned
digraph that is self-stabilizing, via execution of the
protocol, from any initial state, and wherein the
synchronizer transmits the Sync message to as many
other nodes in the network as are directly connected to
the first node.

2. The network of claim 1, wherein the node comprises one
of a plurality of nodes, and wherein the synchronizer transmits
the new Sync message to any of the plurality of nodes
that are directly connected to the node.

3. The network of claim 1, wherein the Sync message is the
only type of message used by the protocol to self-stabilize the
digraph.

4. The network of claim 1, wherein the protocol
deterministically converges to synchrony within a time bound (C) that
is a substantially linear function of P.

5. The network of claim 1, wherein the Sync message
comprises a 1-bit message.

6. The network of claim 1, wherein the synchronizer
ignores all Sync messages that the synchronizer receives
within a calibrated ignore window [D, T_S].

7. The network of claim 1, wherein at least one of the nodes
is anonymous.

8. The network of claim 1, wherein the synchronizer sets
the clock value equal to γ when the Sync message is received
and the clock value is less than D.

9. The network of claim 1, wherein the synchronizer resets
the clock value when the Sync message is received and the
clock value is less than D.

10. The network of claim 1, wherein the synchronizer sets
the clock value equal to γ and transmits the new Sync message
to as many of the other nodes in the network as are directly
connected to the transmitting node when the Sync message is
received and the clock value is greater than or equal to both D
and T_S.

11. The network of claim 1, wherein the synchronizer
resets the clock value and transmits the new Sync message to
as many of the other nodes as are directly connected to
the transmitting node when the Sync message is received and
the clock value is greater than or equal to both D and T_S.
12. A self-stabilizing network comprising a plurality (K) of
nodes in communication with each other, wherein each of the
nodes includes:

a synchronizer;

a set of no more than K−1 monitors in communication with
the synchronizer, wherein each monitor in the set of
monitors is configured to receive a transmitted Sync
message and an incoming clock value from another of
the nodes;

a physical oscillator; and

a logical time clock that is in communication with the
synchronizer and driven by the physical oscillator,
wherein the logical time clock locally keeps track of the
passage of time in a node of the synchronizer as a variable
integer clock value;

wherein the synchronizer, without using a global clock or a
globally-generated signal, globally-generated pulse, or
globally-generated message of any kind for self-stabilization,
executes a predetermined protocol that includes:
when the clock value is less than 0:

resetting the clock value;

when a Sync message is received by the synchronizer
and the clock value is less than a minimum event-response
delay (D):

setting the clock value equal to the sum of a communication
latency value (γ) and the incoming clock
value;

when the Sync message is received by the synchronizer
and the clock value is greater than or equal to both D
and a graph threshold (T_S):

setting the clock value equal to the sum of γ and the
incoming clock value as an updated clock value;
and

transmitting a new Sync message and the updated
clock value to as many of the other nodes as are
directly connected to the corresponding node;
when the clock value is less than a calibrated resynchronization
period (P) and the Sync message is not
received:

incrementing the clock value; and

when the clock value is greater than or equal to P:
setting the clock value to 0; and

transmitting a new Sync message and the clock
value of 0 to as many of the other K nodes as
are directly connected to the corresponding
node;

wherein:

the network is an arbitrary, non-partitioned digraph; and
the Sync message is the only type of message that is used
by the protocol to self-stabilize the digraph.

13. The network of claim 12, wherein the protocol
deterministically converges within a time bound (C) that is a linear
function of P.

14. The network of claim 12, wherein at least one of the
nodes is anonymous.

15. The network of claim 12, wherein each monitor
disposes of all previously received Sync messages after one tick
of the logical time clock.

16. The network of claim 12, wherein the synchronizer
ignores all Sync messages received within a calibrated ignore
window [D, T_S].

17. A method for self-stabilizing an arbitrary, non-partitioned
digraph of K nodes each including a synchronizer, no
more than K−1 monitors per node each in communication
with the synchronizer, a physical oscillator, and a logical time
clock in communication with the synchronizer that has a
variable integer clock value, is driven by the oscillator, and
locally keeps track of the passage of time as the clock value,
the method comprising:

setting the clock value for a first node of the K nodes to 0
when the clock value for the first node is less than 0;

when a Sync message has been received at the first node:
comparing the clock value for the first node to a minimum
event-response delay (D);
setting the clock value for the first node equal to between
0 and a communication latency value (γ) when the
clock value for the first node is less than D; and
setting the clock value for the first node equal to between
0 and γ and transmitting a new Sync message to as
many of the K nodes as are directly connected to the
first node when the clock value for the first node is
greater than or equal to both D and a calibrated graph
threshold (T_S);

when the Sync message has not been received by the first
node:

comparing the clock value for the first node to a calibrated
resynchronization period (P);
incrementing the clock value for the first node when the
clock value for the first node is less than P; and
setting the clock value for the first node to 0 and transmitting
a new Sync message to as many of the K nodes
as are directly connected to the first node when the
clock value for the first node is greater than or equal to
P;

wherein the method deterministically converges to synchrony
within a time bound (C) that is a substantially linear function
of P, and is executed without using a global clock or a
globally-generated signal, globally-generated pulse, or globally-generated
message of any kind for self-stabilization.

18. The method of claim 17, further comprising disposing
of a received Sync message after one tick of the logical time
clock for the first node.

19. The method of claim 17, further comprising:

ignoring all transmitted Sync messages received within a
calibrated ignore window [D, T_S].





